Phonetic knowledge, phonotactics and automatic language identification

Authors

  • Martine Adda-Decker
  • Lori Lamel
  • Jacqueline Vaissiere
Abstract

This study explores a multilingual phonotactic approach to automatic language identification using Broadcast News data. The definition of a multilingual phone set is discussed, and an upper limit on the performance of the phonotactic approach is estimated by eliminating any degradation due to recognition errors. This upper bound is compared with automatic language identification based on a phonotactic approach. The eight languages of interest are Arabic, Mandarin, English, French, German, Italian, Portuguese and Spanish. A perceptual test was carried out to compare human and machine performance in similar configurations. Several phone-set classes were tested, ranging from a binary C/V distinction to a shared phone set of 70 phones. Experiments show that phonotactic constraints are in theory able to identify a language (among 8) with close to 100% accuracy on very short sequences of 1-2 seconds. Automatic and human performance on very short sequences both remain below this theoretical performance.
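As a rough illustration of what a phonotactic approach involves, the sketch below trains one phone-bigram model per language and labels a new phone sequence with the language whose model scores it highest. It is not the authors' system: the add-alpha smoothing, the function names, the toy phone strings and the two placeholder languages are all assumptions made for this example. The `to_cv` helper shows the coarsest phone-set class mentioned in the abstract, a binary C/V distinction.

```python
import math
from collections import Counter

START = "<s>"  # sequence-start marker (an assumption of this sketch)


def train_bigram(sequences, vocab_size, alpha=1.0):
    """Return a function giving add-alpha smoothed bigram log-probabilities."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        padded = [START] + list(seq)
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    def logprob(prev, cur):
        num = pair_counts[(prev, cur)] + alpha
        den = context_counts[prev] + alpha * vocab_size
        return math.log(num / den)

    return logprob


def score(seq, logprob):
    """Log-likelihood of a phone sequence under one language's bigram model."""
    padded = [START] + list(seq)
    return sum(logprob(p, c) for p, c in zip(padded, padded[1:]))


def identify(seq, models):
    """Pick the language whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda lang: score(seq, models[lang]))


def to_cv(seq, vowels):
    """Collapse a phone sequence to a binary consonant/vowel labelling."""
    return ["V" if p in vowels else "C" for p in seq]


# Hypothetical toy data: "lang_a" favours CV syllables, "lang_b" allows clusters.
train = {
    "lang_a": [list("pata"), list("tapa"), list("kata")],
    "lang_b": [list("stra"), list("spat"), list("trak")],
}
vocab = {p for seqs in train.values() for seq in seqs for p in seq}
models = {lang: train_bigram(seqs, len(vocab)) for lang, seqs in train.items()}

guess_a = identify(list("kapa"), models)
guess_b = identify(list("strak"), models)
cv = to_cv(list("stra"), vowels={"a"})
```

With a phone recognizer in front, the input sequences would contain recognition errors. Scoring reference transcriptions instead corresponds to the error-free upper bound described in the abstract.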

Similar resources

The use of syllable phonotactics for word hypothesization

A search technique incorporating the automatic modeling of lexical variability is introduced for medium or large-vocabulary speaker-independent speech recognition. Current state-of-the-art systems depend on being able to model the entire language based on acoustic features and the constraints of syntax or interword probabilities. These methods often fail in the presence of multiple speakers, new vo...

Phonetic, phonemic, and phonological factors in cross-language discrimination of phonotactic contrasts.

Previous research indicates that multiple levels of linguistic information play a role in the perception and discrimination of non-native phonemes. This study examines the interaction of phonetic, phonemic and phonological factors in the discrimination of non-native phonotactic contrasts. Listeners of Catalan, English, and Russian are presented with an initial #CC-#CəC contrast in a discriminat...

A Language Independent Approach To Acquiring Phonotactic Resources for Speech Recognition

Building and developing linguistic resources for languages is of prime importance, with many areas of application. This paper focuses on a fully automatic approach to the acquisition of syllable phonotactics for a particular language. In this approach the phonotactic constraints for a language are encoded in a finite-state phonotactic automaton, the structure of which can be automatically deriv...

The Effect of Using Phonetic Websites on Iranian EFL Learners’ Word Level Pronunciation

Computer-assisted language learning (CALL) is reaching a prominent position in the pedagogical field of English as a Second or Foreign Language (ESL/EFL). The present study was carried out to study the effect of using phonetic websites on Iranian EFL students’ pronunciation and knowledge of phonemic symbols. Participants of the study included 30 EFL female pre-intermediate students studyin...

Incorporating linguistic knowledge into automatic dialect identification of Spanish

Automatic dialect identification, like automatic language identification, has often been approached through the use of phonetic frequencies and phonetic sequence modeling. While such statistical systems perform well on language identification problems, they are less adept at the more difficult problem of automatic dialect identification, particularly on short segments of speech. In this paper ...

Publication date: 2003